scientific claim verification
MuSciClaims: Multimodal Scientific Claim Verification
Yash Kumar Lal, Manikanta Bandham, Mohammad Saqib Hasan, Apoorva Kashi, Mahnaz Koupaee, Niranjan Balasubramanian
Assessing scientific claims requires identifying, extracting, and reasoning with multimodal data expressed in information-rich figures in scientific literature. Despite the large body of work in scientific QA, figure captioning, and other multimodal reasoning tasks over chart-based data, there are no readily usable multimodal benchmarks that directly test claim verification abilities. To remedy this gap, we introduce a new benchmark, MuSciClaims, accompanied by diagnostic tasks. We automatically extract supported claims from scientific articles, which we manually perturb to produce contradicted claims. The perturbations are designed to test for a specific set of claim verification capabilities. We also introduce a suite of diagnostic tasks that help understand model failures. Our results show most vision-language models perform poorly (~0.3-0.5 F1), with even the best model only achieving 0.72 F1. They are also biased towards judging claims as supported, likely misunderstanding nuanced perturbations within the claims. Our diagnostics show that models are poor at localizing correct evidence within figures, struggle to aggregate information across modalities, and often fail to understand basic components of the figure.
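As a concrete illustration of how a benchmark like this might be scored, the sketch below runs a stand-in vision-language classifier over (claim, figure) pairs and computes macro-F1 over support/contradict labels. The `Example` schema, the `classify_claim` stub, and the label names are hypothetical placeholders, not MuSciClaims' actual interface.

```python
# Minimal sketch of a figure-based claim-verification evaluation loop.
# Dataset fields, the classify_claim stub, and the label set are assumptions.
from dataclasses import dataclass
from typing import List

from sklearn.metrics import f1_score

LABELS = ["SUPPORT", "CONTRADICT"]  # assumed binary label set


@dataclass
class Example:
    claim: str
    figure_path: str  # figure image the claim is evaluated against
    gold_label: str   # "SUPPORT" or "CONTRADICT"


def classify_claim(claim: str, figure_path: str) -> str:
    """Stub: replace with a call to the vision-language model under test.
    Must return one of the labels in LABELS."""
    return "SUPPORT"  # trivially support-biased, mirroring the failure mode above


def evaluate(examples: List[Example]) -> float:
    gold = [ex.gold_label for ex in examples]
    pred = [classify_claim(ex.claim, ex.figure_path) for ex in examples]
    return f1_score(gold, pred, labels=LABELS, average="macro")


if __name__ == "__main__":
    demo = [
        Example("Condition A outperforms B in panel (c).", "fig3.png", "SUPPORT"),
        Example("Condition B outperforms A in panel (c).", "fig3.png", "CONTRADICT"),
    ]
    print(f"macro-F1: {evaluate(demo):.2f}")
```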
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification
Chengye Wang, Yifei Shen, Zexi Kuang, Arman Cohan, Yilun Zhao
We introduce SciVer, the first benchmark specifically designed to evaluate the ability of foundation models to verify claims within a multimodal scientific context. SciVer consists of 3,000 expert-annotated examples over 1,113 scientific papers, covering four subsets, each representing a common reasoning type in multimodal scientific claim verification. To enable fine-grained evaluation, each example includes expert-annotated supporting evidence. We assess the performance of 21 state-of-the-art multimodal foundation models, including o4-mini, Gemini-2.5-Flash, Llama-3.2-Vision, and Qwen2.5-VL. Our experiments reveal a substantial performance gap between these models and human experts on SciVer. Through an in-depth analysis of retrieval-augmented generation (RAG) and human-conducted error evaluations, we identify critical limitations in current open-source models, offering key insights to advance models' comprehension and reasoning in multimodal scientific literature tasks.
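One way to picture the RAG analysis mentioned above: before verification, rank a paper's text, table, and figure-caption chunks against the claim and keep only the top-k as context for the verifier. The sketch below uses a simple TF-IDF retriever as a stand-in; the chunking scheme and the `retrieve_context` helper are illustrative assumptions, not SciVer's actual pipeline.

```python
# Minimal sketch of the retrieval step in a RAG-style verification setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_context(claim: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank candidate context chunks by TF-IDF similarity to the claim."""
    vectorizer = TfidfVectorizer().fit(chunks + [claim])
    scores = cosine_similarity(
        vectorizer.transform([claim]), vectorizer.transform(chunks)
    ).ravel()
    top = scores.argsort()[::-1][:k]
    return [chunks[i] for i in top]


if __name__ == "__main__":
    chunks = [
        "Table 2: accuracy of all models on the development split.",
        "Figure 1 caption: training loss over epochs.",
        "Section 5: our model outperforms the baseline by 4 points.",
    ]
    print(retrieve_context("The proposed model beats the baseline.", chunks, k=2))
```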
SciClaimHunt: A Large Dataset for Evidence-based Scientific Claim Verification
Sujit Kumar, Anshul Sharma, Siddharth Hemant Khincha, Gargi Shroff, Sanasam Ranbir Singh, Rahul Mishra
Verifying scientific claims presents a significantly greater challenge than verifying political or news-related claims. Unlike the relatively broad audience for political claims, the users of scientific claim verification systems can vary widely, ranging from researchers testing specific hypotheses to everyday users seeking information on a medication. Additionally, the evidence for scientific claims is often highly complex, involving technical terminology and intricate domain-specific concepts that require specialized models for accurate verification. Despite considerable interest from the research community, there is a noticeable lack of large-scale scientific claim verification datasets to benchmark and train effective models. To bridge this gap, we introduce two large-scale datasets, SciClaimHunt and SciClaimHunt_Num, derived from scientific research papers. We propose several baseline models tailored for scientific claim verification to assess the effectiveness of these datasets. Additionally, we evaluate models trained on SciClaimHunt and SciClaimHunt_Num against existing scientific claim verification datasets to gauge their quality and reliability. Furthermore, we conduct human evaluations of the claims in the proposed datasets and perform error analysis to assess the effectiveness of the proposed baseline models. Our findings indicate that SciClaimHunt and SciClaimHunt_Num serve as highly reliable resources for training models in scientific claim verification.
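For a sense of what a text-only baseline verifier might look like, the sketch below frames claim verification as natural language inference with an off-the-shelf MNLI cross-encoder: the evidence passage is the premise and the claim is the hypothesis. This is a plausible stand-in, not one of the paper's proposed baselines, and the model choice is an assumption.

```python
# Minimal NLI-style verification baseline sketch (assumed, not the paper's).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # illustrative off-the-shelf choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def verify(evidence: str, claim: str) -> str:
    """Treat the evidence as the premise and the claim as the hypothesis."""
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]


if __name__ == "__main__":
    evidence = "Patients receiving the drug showed a 30% reduction in symptoms."
    print(verify(evidence, "The drug reduces symptoms."))           # expect ENTAILMENT
    print(verify(evidence, "The drug has no effect on symptoms."))  # expect CONTRADICTION
```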
SciFact-Open: Towards open-domain scientific claim verification
David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, Hannaneh Hajishirzi
While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature. Moving to this open-domain evaluation setting, however, poses unique challenges; in particular, it is infeasible to exhaustively annotate all evidence documents. In this work, we present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems on a corpus of 500K research abstracts. Drawing upon pooling techniques from information retrieval, we collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models. We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1. In addition, analysis of the evidence in SciFact-Open reveals interesting phenomena likely to appear when claim verification systems are deployed in practice, e.g., cases where the evidence supports only a special case of the claim. Our dataset is available at https://github.com/dwadden/scifact-open.
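The pooling idea borrowed from information retrieval can be made concrete in a few lines: for each claim, take the union of the top-k evidence documents predicted by several systems and send that pool to annotators. The system names, the value of k, and the `build_pool` helper below are illustrative; the paper's exact pooling configuration may differ.

```python
# Minimal sketch of IR-style pooling for evidence annotation.
def build_pool(ranked_runs: dict[str, list[str]], k: int = 10) -> list[str]:
    """ranked_runs maps a system name to its ranked doc IDs for one claim;
    the returned pool is the deduplicated union of each system's top-k."""
    pool: list[str] = []
    seen: set[str] = set()
    for system, ranking in ranked_runs.items():
        for doc_id in ranking[:k]:
            if doc_id not in seen:
                seen.add(doc_id)
                pool.append(doc_id)
    return pool


if __name__ == "__main__":
    runs = {
        "system_a": ["d12", "d7", "d3"],
        "system_b": ["d7", "d99", "d12"],
    }
    print(build_pool(runs, k=2))  # ['d12', 'd7', 'd99']
```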
LongChecker: Improving scientific claim verification by modeling full-abstract context
David Wadden, Kyle Lo, Lucy Lu Wang, Arman Cohan, Iz Beltagy, Hannaneh Hajishirzi
We introduce the LongChecker system for scientific claim verification. Given a scientific claim and an evidence-containing research abstract, LongChecker predicts a veracity label and identifies supporting rationales in a multitask fashion based on a shared encoding of the claim and abstract. We perform experiments on the SciFact dataset, and find that LongChecker achieves state-of-the-art performance. We conduct analysis to understand the source of this improvement, and find that identifying the relationship between a claim and a rationale reporting a scientific finding often requires understanding the context in which the rationale appears. By making labeling decisions based on all available context, LongChecker achieves better performance on cases requiring this type of understanding. In addition, we show that LongChecker is able to leverage weakly-supervised in-domain data to facilitate few-shot domain adaptation for scientific claim verification.
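A rough picture of the multitask setup described here: one shared encoding of the claim concatenated with the abstract, a veracity head reading the sequence-start token, and a rationale head scoring the token that opens each abstract sentence. The sketch below is an assumption-laden re-creation (model choice, label set, head shapes), not the authors' released implementation.

```python
# Minimal sketch of a LongChecker-style multitask verifier (assumed details).
import torch
import torch.nn as nn
from transformers import AutoModel


class MultitaskVerifier(nn.Module):
    """Shared encoder; one head for the veracity label, one for per-sentence rationales."""

    def __init__(self, model_name: str = "allenai/longformer-base-4096", num_labels: int = 3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.label_head = nn.Linear(hidden, num_labels)  # e.g. SUPPORT / REFUTE / NEI
        self.rationale_head = nn.Linear(hidden, 1)       # rationale vs. non-rationale

    def forward(self, input_ids, attention_mask, sentence_starts):
        # input_ids encodes "claim </s> abstract"; sentence_starts holds the token
        # index at which each abstract sentence begins (shape: [num_sentences]).
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state                  # (batch=1, seq_len, hidden)
        label_logits = self.label_head(hidden[:, 0])    # read the sequence-start token
        sentence_reprs = hidden[0, sentence_starts]     # one vector per abstract sentence
        rationale_logits = self.rationale_head(sentence_reprs).squeeze(-1)
        return label_logits, rationale_logits
```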